Writing Documents with LaTeX, R, and R Markdown

Our End Goal

Our end goal is to turn text and code into very nice-looking PDF documents:

What is LaTeX?

LaTeX is a document typesetting language and build system based on TeX

  • Turing-complete
  • Powerful control over formatting
  • Define commands and macros
  • Automate citations
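As a small illustration of command definition, here's a sketch of a one-argument macro (the macro name \shout is invented for this example):

```latex
% A minimal sketch of a user-defined command; the macro name \shout
% is invented for this illustration
\documentclass{article}
\newcommand{\shout}[1]{\textbf{\MakeUppercase{#1}}}  % Bold + uppercase

\begin{document}
Please \shout{listen} carefully.
\end{document}
```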

Installing TeX

TeX comes in many different distributions

  • TeX Live: the distribution of choice on most *nix systems – tug.org/texlive
    • Packages available for most Linux distributions
  • MiKTeX: the most common choice on Windows; has a Windows installer – miktex.org
  • MacTeX: like TeX Live, but with better integration with OS X – tug.org/mactex

A full installation can take up to ~6 GB

Install more packages from CTAN (the Comprehensive TeX Archive Network)

  • Use tlmgr with TeX Live
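For example, managing packages with tlmgr might look like this (the package name here is just an illustration):

```shell
# Illustrative tlmgr usage; requires a TeX Live installation
tlmgr install blindtext      # Install a package from CTAN
tlmgr update --self --all    # Update tlmgr itself and all installed packages
```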

Creating a LaTeX Document

Here’s an example LaTeX document:

\documentclass[12pt]{article}
\usepackage{lmodern}
\usepackage{blindtext}

\begin{document}

\section{Sample Document}

Start writing document here. By default there's an indent. This is some \textbf{bolded}
and \textit{italicized} text. Above is the title page, created with specified title
parameters.

\noindent Start a new paragraph, no indent. Break a line: \\ New line! Here's some 
{\small small text}. {\Large Large text.}

% A comment: line break is defined by an empty line
\blindtext

\end{document}

Breaking It Down

\documentclass[options]{class} defines the layout standard

  • options can set main font size, paper size, and column structure
  • class defines standard layout; can be article, report, book, letter, beamer, etc.

\usepackage[options]{pkgname} imports a TeX package and exposes its commands for use

Document

\begin{blockname} begins some block of text and commands

  • blockname may be associated with some command or custom definition
  • \begin{document} defines where the document actually begins
  • \begin{titlepage} declares a page dedicated to the title

\end{blockname} ends a block

\section defines a new section block; related commands (\chapter, \subsection, etc.) define other levels

Commands

\textbf makes text bold; \textit makes text italicized

Changing text size:

  • \small makes text small; \Large, well, large
  • Other basic size commands exist, such as \large and \Huge

\blindtext, provided by the blindtext package, produces filler text

These are just a few basic commands; many, many more are provided by the core library and by packages
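As a sketch, a few more everyday commands in context:

```latex
% A short sketch of other common commands
\documentclass{article}
\begin{document}

\emph{Emphasized} text, plus an \underline{underlined} word.

\begin{itemize}     % A bulleted list
  \item First item
  \item Second item
\end{itemize}

\begin{enumerate}   % A numbered list
  \item Step one
  \item Step two
\end{enumerate}

\end{document}
```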

Making It Useful

There are a number of TeX compilers that should be included in standard TeX installations.

  • latex - generates a DVI document, but supports only .eps and .ps image formats
  • pdflatex - generates a PDF document; supports .png, .jpg, and .pdf image formats
  • xelatex - like pdflatex, but with robust Unicode and font support
  • lualatex - like xelatex, but can embed Lua code
xelatex example-latex.tex

Ahh! Command Line! Scary!

If you don’t like/understand the command line, there are plenty of web or GUI applications to do compilation for you:

Increasing Complexity

Here’s a slightly more complicated example:

\documentclass[11pt]{article}                             % Declare document class

\usepackage{multicol}                                     % Declare packages used
\usepackage[margin=2.5cm]{geometry}                       % Set geometry options
\usepackage{graphicx,caption}
\usepackage{nonfloat}
\usepackage{fancyhdr}
\usepackage{blindtext}

\newenvironment{colfigure}                                % Defining a new environment
  {\noindent\minipage{\linewidth}}                        % Commands/text to put before
  {\endminipage\par\bigskip}                              % Commands/text to put after

\newcommand{\colgraphic}[2]{%%                            % Defining a custom command
  \begin{colfigure}                                       % Use the environment defined earlier
    \centering                                            % Center contents
    \includegraphics[width=0.85\linewidth]{#1}            % Include the specified image
    \captionof{figure}{#2}                                % Print a caption
  \end{colfigure}
}

\pagestyle{fancy}                                         % Use fancy header style
\fancyhf{}
\rhead{Writing Documents with LaTeX, R, and Rmarkdown}    % Right text in header
\lhead{\thepage\enspace Example Documents}                % Left text in header

\begin{document}                                          % Begin document

\section{Introduction}                                    % Define section header
\blindtext[1]                                             % Generate lorem ipsum

\medskip                                                  % Blank space
\noindent\makebox[\linewidth]{                            % Add a horizontal line
  \rule{\textwidth}{0.2pt}
}

\begin{multicols}{2}                                      % Begin a two-column layout
                                                          % Using an environment called multicols
  \section{Background}                                    % Define a section header
  \blindtext

  \section{Analysis}
  \blindtext

  \colgraphic{./xkcd-file-ext.png}{Here's a caption!}     % Use our command from earlier to add
  \bigskip                                                % an image

  \noindent\blindtext                                     % More lorem ipsum

\end{multicols}                                           % End multicols environment

\end{document}                                            % End document
tree --dirsfirst --charset=ascii -n ./
.
|-- example-twocolumn.tex
`-- xkcd-file-ext.png

Citations with BibTeX

BibTeX is a citation data format useful for generating automatic citations

  • Many, many different citation types (full list at bibtex.com/e/entry-types)
  • Use any bibliography style (BibTeX itself uses .bst style files; citeproc-based tools use .csl files, which are XML)

Citation Databases/Files

Citations can be kept in a file with extension .bib

  • List of citation items with properties
  • Properties separated by commas
  • Strings wrapped in curly braces ({})
    • Maintain preformatted capitalization with nested {}

Some citations from my statistics midterm project:

@book{mishima,
  author = {Yukio Mishima},
  title = {Spring Snow},
  publisher = {Vintage International},
  date = {1990-04-14},
  ISBN = {0679722416},
}

@book{statmethods,
  title = {{NIST}/{SEMATECH} {e}-{Handbook} of Statistical Methods},
  url = {http://www.itl.nist.gov/div898/handbook},
  urldate = {2020-01-11},
}

@inbook{statmethodsheteroscedastic,
  author = {James J. Filliben and Alan Heckert},
  chapter = {1.3.3.26.9: Scatter Plot: Variation of Y Does Depend on X ({heteroscedastic})},
  crossref = {ref_engineering_statistics_handbook},
}

@manual{rpkgcar,
  author = {John Fox and Sanford Weisberg and Brad Price},
  title = {{car}: Companion to Applied Regression},
  note = {R package version 3.0.6},
  url = {https://CRAN.R-project.org/package=car},
  urldate = {2020-01-11},
}

@misc{hadamitzky,
  author = {Wolfgang Hadamitzky},
  title = {Romanization Systems},
  url = {https://www.hadamitzky.de/english/lp_romanization_sys.htm#07},
  urldate = {2020-01-09},
}

Citing in LaTeX with BibLaTeX

Citations can be autogenerated with the biblatex package:

\documentclass[11pt]{article}

\usepackage[
  margin=3cm,
  bottom=15cm,
]{geometry}
\usepackage{babel}
\usepackage{blindtext}
\usepackage[
  backend=biber,                        % Define backend, modern is mostly biber, older is bibtex
  style=verbose,                        % Set citation and bibliography style
  autocite=footnote,                    % Use footnote citations
]{biblatex}                             % Import biblatex package
\addbibresource{example-bib.bib}        % Add a bib file            

\begin{document}

\blindtext
\autocite{mishima}                      % Reference a citation
\autocite{statmethods}

\end{document}
xelatex example-bib.tex       # Initial run to create `.bcf` with citations from `.bib`
biber example-bib             # Generate `.bbl` citation file (note: no .bib extension specified)
xelatex example-bib.tex       # Use `.bbl` file in LaTeX
xelatex example-bib.tex       # Rerun to resolve cross references

MLA with BibLaTeX

MLA citations can be generated using the style=mla option:

\documentclass[11pt]{article}

\usepackage[
  margin=3cm,
  bottom=10cm,
]{geometry}
\usepackage{babel}
\usepackage{blindtext}
\usepackage[
  backend=biber,
  style=mla-new,
]{biblatex}                               % Use MLA 8 style; use `mla` for MLA 7
\addbibresource{example-bib-mla.bib}      % Same citation database as the last example

\begin{document}

\blindtext
Blah blah \autocite{mishima}.

\printbibliography                        % Print the bibliography

\end{document}
xelatex example-bib-mla.tex
biber example-bib-mla
xelatex example-bib-mla.tex
xelatex example-bib-mla.tex

Toning Down the Complexity

Sometimes you might not need that much control or complexity

  • Too many commands
  • Document setup is too complicated
  • Only need basic document or predefined document templates

Markdown

A lightweight markup language for writing documents that need only simple formatting

  • Used almost universally in README files
    • GitHub has its own flavor: GitHub-Flavored Markdown
  • Comes in many different flavors; depends on the parser
  • Many parsers to transform markdown into HTML, PDF, DVI, etc.

An Example

Very basic syntax that should be universal across most markdown processors

# Example Document

This is an example markdown document showcasing some basic, universal
syntax.

## Sub-Header

Here's some more information. How about some **bolded** text? Use *this*
or *this* for italicization.

\^ A blank line to do line breaks for most parsers. This line is
technically just a continuation of the last.

## More Things

Insert a link: [some text placeholder](https://en.wikipedia.org/)

A list of items:

-   One
-   Two
-   Three
    -   Three and some more
-   Four

Features like tables usually differ across processors: some support them, some don't, and the syntax may vary

Rendering Markdown

There are many, many implementations of markdown, mostly targeting HTML:

Check differences with Babelmark: johnmacfarlane.net/babelmark2

Pandoc

Pandoc (pandoc.org) is a universal markup converter

  • Convert to and from tons of markup formats (see conversion diagram)
    • HTML, PDF, TeX, Wiki markups, presentation slides, Word docx, ODT, etc.
  • Has its own markdown flavor

Using Pandoc

We can use Pandoc to convert our markdown into HTML:

pandoc -s example-markdown.md -o example-markdown.html
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
  <meta charset="utf-8" />
  <meta name="generator" content="pandoc" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
  <title>example-markdown</title>
  <style>
      code{white-space: pre-wrap;}
      span.smallcaps{font-variant: small-caps;}
      span.underline{text-decoration: underline;}
      div.column{display: inline-block; vertical-align: top; width: 50%;}
  </style>
  <!--[if lt IE 9]>
    <script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
  <![endif]-->
</head>
<body>
<h1 id="example-document">Example Document</h1>
<p>This is an example markdown document showcasing some basic, universal syntax.</p>
<h2 id="sub-header">Sub-Header</h2>
<p>Here’s some more information. How about some <strong>bolded</strong> text? Use <em>this</em> or <em>this</em> for italicization.</p>
<p>^ A blank line to do line breaks for most parsers. This line is technically just a continuation of the last.</p>
<h2 id="more-things">More Things</h2>
<p>Insert a link: <a href="https://en.wikipedia.org/">some text placeholder</a></p>
<p>A list of items:</p>
<ul>
<li>One</li>
<li>Two</li>
<li>Three
<ul>
<li>Three and some more</li>
</ul></li>
<li>Four</li>
</ul>
</body>
</html>

Converting to TeX

We can convert it to TeX and perhaps modify it before using pdflatex or xelatex:

pandoc example-markdown.md -o example-markdown.tex
\hypertarget{example-document}{%
\section{Example Document}\label{example-document}}

This is an example markdown document showcasing some basic, universal
syntax.

\hypertarget{sub-header}{%
\subsection{Sub-Header}\label{sub-header}}

Here's some more information. How about some \textbf{bolded} text? Use
\emph{this} or \emph{this} for italicization.

\^{} A blank line to do line breaks for most parsers. This line is
technically just a continuation of the last.

\hypertarget{more-things}{%
\subsection{More Things}\label{more-things}}

Insert a link: \href{https://en.wikipedia.org/}{some text placeholder}

A list of items:

\begin{itemize}
\tightlist
\item
  One
\item
  Two
\item
  Three

  \begin{itemize}
  \tightlist
  \item
    Three and some more
  \end{itemize}
\item
  Four
\end{itemize}

Converting to PDF

We can convert it to PDF:

  • Markdown is first converted to TeX
  • pdflatex, by default, then converts the TeX to PDF
  • A TeX installation is required
pandoc example-markdown.md -o example-markdown.pdf

Passing Arguments to Pandoc

Pandoc can read in many properties to customize the output

The simplest way is to pass them in as command line arguments:

# Use xelatex instead of pdflatex
pandoc example-markdown.md -o example-markdown.pdf --pdf-engine=xelatex
# Table of contents, numbered sections, code highlighting color scheme
pandoc example-markdown.md -o example-markdown.pdf --toc --number-sections --highlight-style=pygments

Ahh! Command Line! Scary!

There don’t seem to be any graphical interfaces dedicated to Pandoc, but you can try searching for Pandoc-related extensions for your favorite code/text editor.

Pandoc Markdown

Pandoc has its own flavor of markdown that greatly expands upon basic syntax

  • Optionally enable and disable features when compiling
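Extensions are toggled by appending +extension or -extension to the input format name; for instance (the file names are illustrative; footnotes and raw_html are real Pandoc extension names):

```shell
# Enable footnotes and disable raw HTML parsing in the markdown reader
pandoc -f markdown+footnotes-raw_html example.md -o example.pdf
```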

YAML Header

A header block of YAML can set properties for the file instead of passing them as command line arguments; for example:

  • PDF engine (pdflatex, xelatex, lualatex, etc.)
  • Template file to use
  • Table of contents and depth
  • Bibliography files and options
  • Math rendering method

There are many variables specific to certain output types. See the Pandoc documentation for the full list.

For TeX:

  • Page geometry (geometry)
  • Document class (documentclass)
  • Paper size (papersize)
  • Font style (fontsize, mainfont, CJKmainfont, etc.)
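Putting a few of these together, a header block might look like this sketch (the values are illustrative):

```yaml
---
documentclass: article
fontsize: 12pt
papersize: a4
geometry: margin=3cm
toc: true
---
```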

Code Blocks

Delimit code blocks with triple backticks (```):


``` python
print("This is a code block with syntax highlighting")
```

We can also use tildes and give properties:

  • Block identifier with #identifier
  • Assign classes to use when rendering (.classname)
    • Many built-in classes for language-specific highlighting
    • .numberLines to show line numbers on the left

~~~{#some-rust-code .rust .numberLines}
fn parse(val: Option<String>) -> String {
  match val {
    Some(v) => v,
    None => String::new(),
  }
}
~~~

Tables

There are many different syntaxes that can be used to create tables

Pulled straight from Pandoc’s documentation:


  Right     Left     Center     Default
-------     ------ ----------   -------
     12     12        12            12
    123     123       123          123
      1     1          1             1

Table:  Demonstration of simple table syntax.

Using pipe syntax:


| Right | Left | Default | Center |
|------:|:-----|---------|:------:|
|   12  |  12  |    12   |    12  |
|  123  |  123 |   123   |   123  |
|    1  |    1 |     1   |     1  |

  : Demonstration of pipe table syntax.

Block Quotations


> insert a block quotation like this. This paragraph has two lines.
>
> 1.  A list item inside the block quote
> 2.  Another item
>
> > Nested block quotation

Math

Delimit LaTeX mathematical expressions with $:

Here's some regular text. Now, for a mathematical expression:

$$\int_a^b \frac{1}{x} dx = \ln{b} - \ln{a}$$

Figure Placement and Captions

Pandoc will automatically put our caption text underneath the image with the prefix “Figure X”

  • X denotes that this is the Xth image in the document
Blah blah blah...

![We can put a caption here!](path/to/some/image)

Check out the Pandoc manual (https://pandoc.org/MANUAL.html) for more on Pandoc’s capabilities

Here’s an Example

---
geometry: 'margin=3cm, bottom=3cm'
---

# Pandoc Markdown

## Header {#header-identifier}

`\blindtext[1]` from the `blindtext` package would produce something
like this[^1]:

> Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam vel
> commodo lectus, eget pretium purus. Mauris eleifend mattis elit, nec
> maximus turpis lacinia a. ...

Very important data!:

| Year |  Total production|  \% Coal|  \% Petroleum|  \% Natural gas|  \% Other|
|:-----|-----------------:|--------:|-------------:|---------------:|---------:|
| 1960 |     41.5 quad Btu|     26.1|          36.0|            34.0|       3.9|
| 1970 |     62.1 quad Btu|     23.5|          32.9|            38.9|       4.7|
| 1980 |     64.8 quad Btu|     28.7|          28.2|            34.2|       8.9|

: Energy production by major source from 1960 to 1980

$$\int_a^b \frac{1}{x} dx = \ln{b} - \ln{a}$$

## Second Header

See [Header](#header-identifier).

![The spotted skunk does a handstand as its final spray
warning](skunk-warning.jpg){width="50%"}

[^1]: This is a footnote!
pandoc example-markdown-pandoc.md -o example-markdown-pandoc.pdf

Citations in Pandoc Markdown

Citation generation requires pandoc-citeproc, an external filter

  • Specify bibliography (.bib) files with bibliography in the header
  • Optionally, specify citation style with csl
  • Optionally, specify other citations with references
  • Square brackets to cite something
---
bibliography: 'example-bib.bib'
csl: chicago.csl
geometry: 'margin=3cm, bottom=15cm'
references:
- author:
  - family: Watson
    given: J. D.
  id: watson
  title: Molecular structure of nucleic acids
  type: article
---

Blah blah [see @statmethods]. Blah blah [@mishima, pp. 20-21]. Blah blah
[@statmethodsheteroscedastic; @watson].
tree --dirsfirst --charset=ascii -n ./
.
|-- example-markdown-pandoc-bib.md
|-- example-bib.bib
`-- modern-language-association.csl
pandoc example-markdown-pandoc-bib.md --filter pandoc-citeproc -o example-markdown-pandoc-bib.pdf

One Step in Another Direction: R

R is a programming language for statistical computing

  • Used widely by statisticians, data analysts, etc.
  • Interpreted and dynamically typed language

Installing R

Download R from the homepage of The R Project for Statistical Computing – r-project.org

  • Packages available for most Linux distributions
  • Install on Homebrew: brew install r
  • Install on Chocolatey: choco install r

Install R packages from CRAN (the Comprehensive R Archive Network)

R Session

Running R starts a basic interactive session

R code can be written in an .R file; run with Rscript file.R

Ahh! Command Line! Scary!

Some graphical R environments:

Some R Basics

Variable assignment is done with <- operator

s <- "Hello World!"
print(s)
## [1] "Hello World!"

Using packages outside of the standard library:

readr::read_csv("docs/file_hash.csv")
library(readr)
read_csv("docs/file_hash.csv")

Data Types

In R, everything is a vector – an array of same-type items

# c() is to concatenate items and form a vector
vec <- c(1, 5, 3.5, 7)

Three basic classes of vectors that you need to know for basic stuff:

  • Logical – boolean (TRUE, FALSE)
  • Numeric - double-precision decimals and integers (5, 7.7777, 1.234)
    • Converts readily between the two when necessary
  • Character - strings, characters ("z", "bad", "FALSE")
a <- 1.6    # a is of numeric class
b <- 5      # b is of numeric class
a + b       # An expression will write the result to standard output
## [1] 6.6
c <- a * b == 8   # c is of logical class
c
## [1] TRUE

Lone values are just vectors of length 1:

a <- 1.6
a == c(1.6)   # This is double =, though it looks like just one long =
## [1] TRUE

Vectors are 1-indexed:

vec <- c(1, 2, 3, 4, 5)
vec[1]
## [1] 1
vec[2]
## [1] 2
vec[0]  # Doesn't exist!
## numeric(0)
vec[6]  # Out of bounds!
## [1] NA

Iterate over vectors with for loops:

vec <- c("Cueball", "Megan", "Hairy", "Ponytail",
         "Black Hat", "Danish", "Beret Guy")
for (v in vec) {
  print(v)
}
## [1] "Cueball"
## [1] "Megan"
## [1] "Hairy"
## [1] "Ponytail"
## [1] "Black Hat"
## [1] "Danish"
## [1] "Beret Guy"
for (i in seq_along(vec)) {
  print(paste(i, ":", vec[i], sep = " "))
}
## [1] "1 : Cueball"
## [1] "2 : Megan"
## [1] "3 : Hairy"
## [1] "4 : Ponytail"
## [1] "5 : Black Hat"
## [1] "6 : Danish"
## [1] "7 : Beret Guy"

Data Frames

A data frame is a two-dimensional array-like / table-like structure

  • Each column contains values of a single type
  • Each row has values for each column
people_df <- data.frame(
  id = c(1:5),
  name = c("A", "B", "C", "D", "E"),
  iq = c(81, 98, 113, 135, 162),
  stringsAsFactors = FALSE            # Disable string storage as factors
                                      # if you want to modify the strings
)

Access columns as vectors with the $ operator – data_frame$column_name:

people_df$name                        # The names column
## [1] "A" "B" "C" "D" "E"
people_df$iq                          # The iq column
## [1]  81  98 113 135 162

Access specific rows, columns, or cells with data_frame[row, colname]:

people_df[1, ]                        # First row
##   id name iq
## 1  1    A 81
people_df[, "id"]                     # The id column
## [1] 1 2 3 4 5
people_df[3, "iq"]                    # IQ of the person in the third row
## [1] 113

Reading in Data from Files

We can read in data from a CSV (comma-separated values) file into a data frame like so:

data <- read.csv("docs/file_hash.csv")

class(data)
## [1] "data.frame"
data
##     X                                Path                             Hash
## 1   1            docs/example-bib-mla.tex 2a57a72d7fce71b6923f8435e5a5d4de
## 2   2          docs/example-twocolumn.tex 59d29b2eea051d408bf6997d27a0337e
## 3   3        docs/example-markdown-pdf.md 372bac0c4bb56d39c82cfb42f9556b58
## 4   4        docs/example-markdown-tex.md 372bac0c4bb56d39c82cfb42f9556b58
## 5   5              docs/example-latex.tex 26137162c441c4c7c94904144d7f59e3
## 6   6            docs/example-markdown.md 372bac0c4bb56d39c82cfb42f9556b58
## 7   7            docs/example-bib-mla.bib b5d5948f9525abfa801fc26788417912
## 8   8                         styles.scss 7c04b04be83e97adbe3b0a49e6997642
## 9   9                docs/example-bib.tex 2f8d7b1e57a9d10b9042fb8f606feff9
## 10 10     docs/example-markdown-pandoc.md 93822f77a0b25747305eb3339de8897a
## 11 11 docs/example-markdown-pandoc-bib.md 16699d8dbf96a24f90a473b9780b2c66
## 12 12                docs/example-bib.bib b5d5948f9525abfa801fc26788417912

Plotting Data

R’s standard library provides many visualization options

  • hist plots a histogram
  • plot plots a scatter plot
  • boxplot plots a box and whisker plot
  • qqnorm plots a normal quantile plot

… and many, many more

  • ggplot2 package provides more functionality
data <- read.csv("docs/sample-data.csv")
df <- data.frame(score = data$Score, scored_by = data$ScoredBy)
df <- df[order(df$scored_by), ]

hist(df$score,                                  # Data to be plotted
     breaks = seq(3, 10, 0.25),                 # Bin limits; min of 3,
                                                # max of 10, 0.25 step size
     main = "This is the title of the plot",    # Plot title
     xlab = "Label of the x-axis",              # X-axis label
     ylab = "Label of the y-axis",              # Y-axis label
     col = "lightgrey"                          # Bar fill color
)

plot(df$scored_by, df$score,
     main = "Number vs Some Other Number",
     xlab = "Number",
     ylab = "Number",
     cex = 0.6
)
fit <- lm(df$score ~ df$scored_by)
abline(fit)

R Markdown – At Last!

A flavor of markdown based on Pandoc markdown with excellent extra features

  • Embedding code chunks via knitr
  • Generation of interactive documents

It invokes Pandoc at compile time

  • All Pandoc markdown syntax still works

Installing R Markdown

To be able to compile R markdown, the rmarkdown package is required:

install.packages("rmarkdown")

Compiling R Markdown

We can use rmarkdown (in an R session) to compile our Pandoc markdown file into a PDF just like before to produce the same result:

rmarkdown::render("example-markdown-pandoc.md")

Using knitr

knitr allows inserting code chunks, usually of R (we’ll discuss other options later), into markdown that will be evaluated at compile time

  • rmarkdown::render will automatically call knitr::knit to transform code chunks

To write a code chunk, create a markdown code block like so:

```{r}
a <- c(1, 3, 5, 7, 8)
a
```

Let’s put that in an R markdown file:

---
geometry: margin=3cm
output: pdf_document
---

Blah blah blah, here's an R code chunk:

```{r}
a <- c(1, 3, 5, 7, 8)
a
```

The results of all expressions (not statements) are displayed, as shown above.
rmarkdown::render("example-rmarkdown.Rmd")

Variables are shared between different chunks:

```{r}
scandinavia <- c("Denmark", "Norway", "Sweden")
```

```{r}
fennoscandia <- c(scandinavia, "Finland")
```

Inline R

Code can be embedded inline like so:

---
geometry: margin=3cm
output: pdf_document
---

```{r}
i <- 590982
```

Here's some text. Now, we want to put the number $i =$ `r i` in our text.

Chunk Options

Each chunk can receive local options that change its behavior and output

Conditional Evaluation/Display

All chunk option values are R expressions, so you can supply any R expression of the matching type

eval – (logical or numeric) determines whether or not to execute the chunk; TRUE by default

```{r, eval = some_logical_variable_elsewhere}
1 + 1     # Nothing in this chunk is evaluated unless that variable is TRUE
```

echo – (logical or numeric) whether or not to include the chunk's source code in the output; TRUE by default

```{r, echo = FALSE}
a <- 200 * 4                      # This code will not be displayed
a                                 # But the output of this will
```

results – (character) if/how to display expression output; markup by default

```{r, results = 'hide'}
scandinavia <- c("Denmark", "Norway", "Sweden")
scandinavia                                       # Output of this will not be shown
```
---
geometry: margin=3cm
output: pdf_document
---

Blah blah blah...

```{r, eval = FALSE}
print("This won't be evaluated, but we can still see the code")
```

```{r, echo = FALSE}
print("The source code that produced this output isn't shown")
```

```{r, results = 'hide'}
print("We can see the source code, but not the result")
```

Plotting

Graphical plots produced will automatically be inserted as figures in the output

There are lots of figure-specific chunk options

  • fig.cap can be used to set the image caption for the figure output
  • fig.width and fig.height set the figure height and width, in inches
  • out.width and out.height scale the rendered figure to the given dimensions
---
geometry: margin=3cm
output: pdf_document
---

# Lorem ipsum

As shown in Figure 1, we have come across some very groundbreaking findings.

Aliquam molestias quo distinctio id ipsa aut. Optio error et iure dolorem
ducimus velit aliquam. Inventore et aliquid facilis.

```{r, echo = FALSE, fig.cap = "Histogram of Some Data", fig.width = 4.25, fig.height = 3}
data <- read.csv("sample-data.csv")
df <- data.frame(score = data$Score, scored_by = data$ScoredBy)
df <- df[order(df$scored_by), ]

hist(df$score, breaks = seq(3, 10, 0.25),
     main = "", xlab = "Some Data", ylab = "Frequency",
     col = "lightgrey", cex.main = 0.6, cex.axis = 0.75, cex.lab = 0.75
)
```

Est et dolore illum modi. Est laudantium sint alias. Dolorem possimus rerum est
ut sed molestiae. Quo ut in numquam rerum non accusamus fugit. Sapiente
provident quod quia similique.

Et unde fuga sit vero magnam eaque mollitia dolorum. At et dolor tenetur
molestiae. Ut dolorem et iste omnis nesciunt facere accusantium. Soluta earum
voluptatem quas sint ut. Magni maxime et doloremque et est.
rmarkdown::render("example-rmarkdown-fig.Rmd")

Data Frames as Tables

Data frames can be rendered as tables with the knitr::kable function:

```{r}
df <- read.csv("some-data.csv")
knitr::kable(df, "pandoc",
             col.names = c("A", "B", "C", "D"),
             align = c("r", "l", "l", "l")
)
```

Other Engines

knitr supports embedding code chunks in many other languages:

names(knitr::knit_engines$get())
##  [1] "awk"       "bash"      "coffee"    "gawk"      "groovy"    "haskell"  
##  [7] "lein"      "mysql"     "node"      "octave"    "perl"      "psql"     
## [13] "Rscript"   "ruby"      "sas"       "scala"     "sed"       "sh"       
## [19] "stata"     "zsh"       "highlight" "Rcpp"      "tikz"      "dot"      
## [25] "c"         "fortran"   "fortran95" "asy"       "cat"       "asis"     
## [31] "stan"      "block"     "block2"    "js"        "css"       "sql"      
## [37] "go"        "python"    "julia"     "sass"      "scss"

Replace {r} with {language} to use that language’s engine

  • Some engines require external interpreters, such as Python

Using Python Instead of R

Here’s a Python example:

```{python}
s = 'Hello World!'
print(s)
```

You can even use libraries like matplotlib in Python to draw plots:

```{python}
import matplotlib.pyplot as plt
plt.plot([0, 2, 1, 4])
plt.show()
```

Other Possibilities

See bookdown.org/yihui/rmarkdown/language-engines.html for other examples in other languages

  • SQL - open a SQL database connection and make queries
  • Rcpp - compile C++ into R functions (not limited to knitr)
  • JavaScript - include JS to be executed within an HTML output
  • C - compile C functions to be called from R

See github.com/yihui/knitr-examples for even more samples

Rcpp Example

A Fibonacci function written in C++, compiled, and called from R

```{Rcpp fibcpp}
// C++ code
#include <Rcpp.h>

// [[Rcpp::export]]
int fibonacci(const int x) {
  if (x == 0 || x == 1)
    return (x);
  return (fibonacci(x - 1)) + fibonacci(x - 2);
}
```

```{r fibtest, dependson = 'fibcpp'}
# R code
fibonacci(4L)
fibonacci(10L)
```
## [1] 3
## [1] 55

Sweave, RHTML, Rasciidoc

knitr allows code chunk embedding in markup languages other than markdown as well

  • Code chunk syntax differs a little bit for each format
  • Sweave - code chunks in LaTeX (.Rnw)
  • RHTML - code chunks in HTML (.Rhtml)
  • Rasciidoc - code chunks in asciidoc (.Rasciidoc)
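For instance, a minimal Sweave (.Rnw) sketch, where chunks are delimited by <<>>= and @:

```latex
% A minimal Sweave/.Rnw sketch; process with knitr::knit() or R CMD Sweave
\documentclass{article}
\begin{document}

Some text, then an R code chunk:

<<samplechunk, echo=TRUE>>=
x <- c(1, 2, 3)
mean(x)
@

\end{document}
```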
<!DOCTYPE html>
<html>
  <body>
    <p>Blah blah, blah blah blah...</p>
    <!-- begin.rcode
      1 + 1
      plot(mpg~hp, mtcars)
      end.rcode-->
  </body>
</html>

Remember, Pandoc and R markdown aren’t just limited to PDF outputs

  • This reveal.js presentation was written with R markdown.

A Final Example

A final product might look something like this:

---
author: Eric Zhao
bibliography: references.bib
date: "`r format(Sys.time(), '%d %B %Y')`"
title: "**Two Variable Data Analysis**"

CJKmainfont: IPAPMincho
csl: chicago.csl
fontsize: "10pt"
geometry: margin=2cm
header-includes:
  - \usepackage{nonfloat}
  - \usepackage{multicol}
  - \newcommand{\hideFromPandoc}[1]{#1}
  - \hideFromPandoc{
      \let\Begin\begin
      \let\End\end
    }
  - \raggedcolumns
indent: True
linestretch: 1
link-citations: True
numbersections: True
output:
  pdf_document:
    df_print: kable
    fig_width: 6
    fig_height: 4
    fig_crop: False
    highlight: monochrome
    keep_tex: True
    latex_engine: xelatex
toc: True
---

```{r global_options, include=FALSE}
extrafont::loadfonts()
fulldata <- readr::read_csv("data.csv")
data <- readr::read_csv("sample.csv")
data <- data[order(data$Season), ]
par(mar = c(2, 3, 0, 3), oma = c(0, 0, 0, 0), family = "CM Roman CE")
score <- data$Score
season <- data$Season

knitr::knit_hooks$set(wrapf = function(before, options, envir) {
  if (before) {
    return("\\begin{minipage}{0.85\\columnwidth}\n\\mbox{}\n\\centering")
  } else {
    output <- vector(mode = "character", length = options$fig.num + 1)

    for (i in 1:options$fig.num) {
      output[i] <- sprintf("\\includegraphics{%s}\n\\figcaption{%s}",
                           knitr::fig_path(number = i), options$fig.cap)
    }

    output[i + 1] <- "\\end{minipage}\\bigskip"
    return(paste(output, collapse = ""))
  }
})
```

\medskip
\noindent\makebox[\linewidth]{\rule{\textwidth}{0.2pt}}

\Begin{multicols}{2}

# Introduction

This report attempts to analyze the correlation between time, measured by the number of
seasons ("cours") since the first season of 1980, and the scores of anime gathered from the
anime database and community website, [MyAnimeList.net](https://myanimelist.net).

## Background

「アニメ」, romanized in the Hepburn system (a system for the romanization of Japanese
using the Latin alphabet [@ref_romanization_sys]) as "anime", is defined in the
Merriam-Webster English Dictionary as "a style of animation originating in Japan that is
characterized by stark colorful graphics" [@ref_anime_def]. Some scholars suggest that it
may not be appropriate to refer to "anime" narrowly as an art style, but rather generally as
animation created by Japanese artists within a Japanese context
[@ref_anime_cross_culture]. In general usage, the word may be used to refer to a single
production, a plural number of productions, or the medium as a whole. In this study, the
word "show" may also be used to refer to a single anime production.

There is no single definition of which productions may be classified as "anime" because
the meaning of the word in the Japanese language describes all forms of animated media.
Some scholars suggest that the term can be used to refer not only to animation from Japan,
but also animated media from other nations. Others strictly limit the classification of
anime to only Japanese animation.

Discussion has been published regarding the effect of anime consumption by international
audiences in amplifying understanding and consideration across cultures
[@ref_anime_cross_culture]. Price discusses how exposure to anime internationally has
sparked and furthered cross-cultural interest. With the expansion of the internet, access
and exposure to anime has increased, and many websites have developed as hubs for the
gathering of millions of viewers across the globe.

One such website, MyAnimeList.net (referred to subsequently simply as "MyAnimeList"), is an
English-speaking anime and manga database and community founded in 2006 by Garrett Gyssler
[@ref_mal_founded]. The website maintains a database of professionally produced animation
from Japan, Korea, China, or a mixture of those three nations. Independent works may be
added if they pass certain qualifications. The database does not include live-action
productions, trailers or advertisements, foreign versions of anime, or titles that have
not been confirmed to exist [@ref_mal_db_guidelines]. The anime in the MyAnimeList
database are those considered in this study.

The database contains a wide variety of information on each title, including synopses,
background information, the genres of entertainment it might be considered to belong to,
information about the producing studio(s) and domestic and international licensors,
official artwork, and per-episode information.

A particular property of interest is the "season" the anime first premiered in. In
consideration of TV anime releases, the term "season" or "cour" may be used in reference
to approximately a single quarter, or thirteen weeks, of the year. The quarters correspond
in general timing and name to the four natural temperate seasons. Most TV anime follow the
schedule of weekly releases beginning on the first or second week of the season. The
Winter season corresponds with the months of January, February, and March; the Spring
season corresponds with April, May, and June; and so on. For example, Winter 2020 began
shortly after the New Year in the first week of January 2020. All one-cour shows (usually
comprising twelve episodes) will likely finish airing by the start of Spring 2020, which
will begin at the start of April 2020. Some shows may begin release in special premieres
before the start of the season, but since most of the episodes air within the season, the
show is considered to premiere in that season.

Registered users on the MyAnimeList platform may add anime that they have watched or plan
to watch on their personal list. Adding anime they have watched typically entails the
attachment of an arbitrary, unitless score to that particular title. This score must be an
integer from $1$ to $10$, inclusive, where $10$ typically means that the user regards the
show as a masterpiece, and $1$, that the user regards the show as part of the lowest tier
of anime. Users may also post a review for the show describing the rationale for the score
they assign. MyAnimeList calculates and displays an aggregate score for each anime, a
decimal value rounded to two places when displayed, based on user scores, the number of
users who have scored the title, and a number of other variables [@ref_mal_score_calc].

# Data

The population considered consists of all the anime in the MyAnimeList database that
satisfy the requirements outlined below. The Jikan API [@ref_jikan_api], a public,
unofficial MyAnimeList JSON API, was queried by a script to retrieve and automatically
filter anime data from MyAnimeList. The final population set has `r nrow(fulldata)`
titles.

The data set analyzed is a simple random sample of `r nrow(data)` anime selected from the
population. For these anime, the relationship between time, the explanatory variable, and
MyAnimeList aggregate score, the response variable, is examined.

## Format

Time is considered in units of seasons. In the data, the premiere season for each show is
represented as the number of seasons since Winter 1980. For example, Spring 2011 would be
represented by the number `r (4 * 2011 + 1) - (4 * 1980)`.

The aggregate score value is the same hundredths-rounded decimal displayed on MyAnimeList.
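This encoding can be sketched as a small conversion function, mirroring the arithmetic of the appendix's sampling script (`seasonIndex` is a hypothetical helper, not part of the report's code):

```go
package main

import "fmt"

// seasonIndex converts a premiere year and season number
// (Winter = 0, Spring = 1, Summer = 2, Fall = 3) into the
// number of seasons since Winter 1980, matching the report's
// season encoding.
func seasonIndex(year, season int) int {
	return 4*(year-1980) + season
}

func main() {
	fmt.Println(seasonIndex(2011, 1)) // Spring 2011 → 125
	fmt.Println(seasonIndex(1980, 0)) // Winter 1980 → 0
}
```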

## Requirements

\noindent
**Aggregate score** The listing must be given an aggregate score. Listings with no
aggregate scores are filtered out. MyAnimeList omits aggregate scores for anime that fail
certain requirements [@ref_mal_score_calc]. These include shows that have not yet been
released or are rated R18+.[^1]

\noindent
**Airing status** Anime that have not finished airing are not included. Scores are likely
to fluctuate as the show is airing and usually stabilize after users input their final
scores.

\noindent
**Number of scores** The listing must have been scored by 1500 or more users. This is to
include only shows exposed to a sufficiently large audience.

## Sampling Process

Each of the anime in the population was assigned an integer identifier from $0$ to
`r nrow(fulldata)-1` in order of MyAnimeList rank, separate from the existing MyAnimeList
database identifier. `r nrow(data)` unique integers in the range [0,`r nrow(fulldata)-1`] were
generated using a random number generator. Those anime whose assigned identifiers matched
the generated integers were chosen to be included in the sample. The sampling process was
performed in a script.[^2]

## Limitations

Consideration of the data as a real reflection of the MyAnimeList userbase's opinion is
subject to a number of limitations.

Firstly, only the opaque aggregate score value calculated by the MyAnimeList website is
considered. The value is a weighted combination of individual scores selected by different
users who may apply varied scoring scales and criteria.

Secondly, not all users have watched and scored all the anime. Because of licensing and
availability, as well as the simple question of time, it is essentially impossible to
assume so. Therefore, each of the scores represents only a subset of all the users on
MyAnimeList.

# Analysis

```{r consts, message = FALSE, echo = FALSE}
xlbl <- "Seasons since Winter 1980"
ylbl <- "Aggregate Score"
mainsize <- 0.8
lblsize <- 1.2
axissize <- 1.2
ptsize <- 0.6
```

## Scatter Plot

All data points in the sample are plotted (see Figure \ref{scatter}). As the season value
becomes greater, representing more recent seasons, the aggregate score values become more
varied.

```{r scatterplot, message = FALSE, echo = FALSE, wrapf = TRUE, fig.show = 'hide', fig.cap = "Aggregate Scores over Seasons Scatter \\label{scatter}"}
plot(x = season, y = score,
     ylim = c(floor(min(score)), 10),
     main = "", xlab = xlbl, ylab = ylbl,
     cex.main = mainsize, cex.lab = lblsize, cex.axis = axissize,
     cex = ptsize
)
```

\noindent Since there exists a direct relationship between how recent the season is and
the number of anime (see Figure \ref{seasonhisto}), this is rather unsurprising. There
does not appear to be any kind of strong relationship between the two variables.

```{r season_histogram, message = FALSE, echo = FALSE, wrapf = TRUE, fig.show = 'hide', fig.cap = "Frequency of Anime in Seasons \\label{seasonhisto}"}
season_range <- range(season)
hist(season,
     breaks = seq(0, season_range[2], season_range[2] / 30),
     main = "", xlab = xlbl, ylab = "Frequency",
     cex.main = mainsize, cex.lab = lblsize, cex.axis = axissize,
     cex = ptsize
)
```

## Linear Model

```{r regeq, message = FALSE, echo = FALSE}
regeq <- function(summary, x) {
  slope <- summary$coefficients[2]
  yint <- summary$coefficients[1]
  return(sprintf("$\\hat{y} = %s %.3f %s %s %.3f$",
                     if (slope < 0) "" else "+",
                     slope,
                     x,
                     if (yint < 0) "" else "+",
                     yint))
}

```

```{r lin_regression, message = FALSE, echo = FALSE}
linreg <- lm(score ~ season)
linreg_summary <- summary(linreg)

linreg_slope <- linreg_summary$coefficients[2]
linreg_yint <- linreg_summary$coefficients[1]
linreg_rsq <- linreg_summary$r.squared
linreg_r <- sqrt(linreg_rsq)

linreg_eq <- regeq(linreg_summary, "x")
```

A least-squares regression line is calculated and laid over the scatter plot (see Figure
\ref{scatterlinreg}). The LSRL is represented by the equation `r linreg_eq`, where
$\hat{y}$ represents the predicted aggregate score and $x$ represents the number of
seasons since Winter 1980. The model predicts the score for anime that premiere in Winter
1980 to be `r sprintf("$%.3f$", linreg_yint)`. For every increase in the number of seasons
since Winter 1980 by $1$, the LSRL states that the aggregate MyAnimeList score will
`r if (linreg_slope < 0) "decrease" else "increase"` by
`r sprintf("$%.3f$", abs(linreg_slope))`.

```{r scatterplot_regression, message = FALSE, echo = FALSE, wrapf = TRUE, fig.show = 'hide', fig.cap = "Aggregate Scores with Linear Model \\label{scatterlinreg}"}
plot(x = season, y = score,
     ylim = c(floor(min(score)), 10),
     main = "", xlab = xlbl, ylab = ylbl,
     cex.main = mainsize, cex.lab = lblsize, cex.axis = axissize,
     cex = ptsize
)
abline(linreg)
```

\noindent However, the $r$ value of the model is `r sprintf("$%.3f$", linreg_r)`,
suggesting a minimal linear relationship between the MyAnimeList aggregate score and the
number of seasons since Winter 1980. The $r^2$ value is
`r sprintf("$%.3f$", linreg_rsq)`, meaning that `r sprintf("$%.3f$", 100 * linreg_rsq)`
percent of the variation in the score can be explained by the LSRL between the number of
seasons since Winter 1980 and the aggregate score; the $r^2$ value indicates a very weak
linear relationship between the two variables.

The residual plot (see Figure \ref{residuallinreg}) also suggests a weak linear
relationship; the residuals are not in uniformly random scatter around the horizon.

```{r residual_linear, message = FALSE, echo = FALSE, wrapf = TRUE, fig.show = 'hide', fig.cap = "Residual Plot of Linear Model \\label{residuallinreg}"}
linreg_resid <- resid(linreg)
plot(x = season, y = linreg_resid,
     main = "", xlab = xlbl, ylab = "Residual",
     cex.main = mainsize, cex.lab = lblsize, cex.axis = axissize,
     cex = ptsize
)
abline(0, 0)
```
\noindent Thus, no strong linear relationship between the variables is concluded.

## Logarithmic Model

```{r, message = FALSE, echo = FALSE}
logreg <- lm(score ~ log(season))
logreg_summary <- summary(logreg)
logreg_slope <- logreg_summary$coefficients[2]
logreg_yint <- logreg_summary$coefficients[1]
logreg_rsq <- logreg_summary$r.squared
logreg_eq <- regeq(logreg_summary, "\\ln{x}")
```

In addition to the linear model, a logarithmic model, using the transformation
$x_{new} = \ln{x}$ was calculated for the data, where $x$ is the explanatory variable. The
model is represented by the equation `r logreg_eq` (see Figure \ref{scatterlogreg}), where
$\hat{y}$ is the predicted aggregate score, suggesting that with every increase in the
value of $\ln{x}$ by $1$, the value of $\hat{y}$
`r if (logreg_slope < 0) "decreases" else "increases"` by
`r sprintf("$%.3f$", abs(logreg_slope))`. Because $\ln{x}$ is undefined at $x = 0$, the
intercept corresponds to $x = 1$ (Spring 1980), where the model predicts a score of
`r sprintf("$%.3f$", logreg_yint)`.

The model has an $r$ value of `r sprintf("$%.3f$", sqrt(logreg_rsq))` and $r^2$ value of
`r sprintf("$%.3f$", logreg_rsq)`, meaning `r sprintf("$%.3f$", 100 * logreg_rsq)` percent
of the variation in the aggregate score can be explained by the LSRL between the season
value and the aggregate score.

```{r, message = FALSE, echo = FALSE, wrapf = TRUE, fig.show = 'hide', fig.cap = "Aggregate Scores with Logarithmic Model \\label{scatterlogreg}"}
plot(x = season, y = score,
     ylim = c(floor(min(score)), 10),
     main = "", xlab = xlbl, ylab = ylbl,
     cex.main = mainsize, cex.lab = lblsize, cex.axis = axissize,
     cex = ptsize
)

season_range <- range(season)
xvals <- seq(season_range[1], season_range[2], len = length(score))
lines(xvals, predict(logreg, newdata = data.frame(season = xvals)))
```

\noindent The residual plot is shown in Figure \ref{residlogreg}. The residual plots for
the linear model and logarithmic model are very similar. The correlation coefficients and
coefficients of determination for both models suggest that they are both weak models.

```{r, message = FALSE, echo = FALSE, wrapf = TRUE, fig.show = 'hide', fig.cap = "Residual Plot of Logarithmic Model \\label{residlogreg}"}
logreg_resid <- resid(logreg)
plot(x = season, y = logreg_resid,
     main = "", xlab = xlbl, ylab = "Residual",
     cex.main = mainsize, cex.lab = lblsize, cex.axis = axissize,
     cex = ptsize
)
abline(0, 0)
```

## Prediction

The correlation coefficients and coefficients of determination for the two models
considered suggest that the logarithmic model is a slightly worse fit for the data.

Problematically, the data exhibits heteroskedasticity, meaning the variation in the
response variable is nonconstant over the values of the explanatory variable
[@ref_heteroskedasticity]. In the case of the data, the variation in the aggregate score
values increases as the value for the number of seasons since Winter 1980 increases. As
evidenced in the scatter plot, small values of the latter variable result in small scatter
in the former variable, whereas large values of the latter variable yield large scatter in
the former variable.

The presence of heteroskedasticity can be verified using the Breusch-Pagan test
[@ref_rectify_heteroskedasticity]. The test was run using the `ncvTest()` function from
the R package `car` [@ref_r_pkg_car]. The output is shown below.

```{r ncvtest}
car::ncvTest(linreg)
```

\noindent The p-value result from the chi-squared test is less than $0.05$; thus, the null
hypothesis that the data is homoskedastic is rejected and heteroskedasticity is inferred.

Because of the heteroskedastic nature of the data, the predictive accuracy of the model
likely decreases as greater explanatory variable values are used to predict the aggregate
score for anime premiering in that season. Nonetheless, an attempt will be made at using
the linear model to produce both interpolative and extrapolative aggregate score
predictions.

### Interpolative Prediction

```{r, message = FALSE, echo = FALSE}
linreg_eval <- function(x) {
  linreg_yint + linreg_slope * x
}

inter_low_x <- 2 + 1995 * 4 - (1980 * 4)
inter_low_y <-  linreg_eval(inter_low_x)
```

An interpolative prediction for the score of an anime released in Summer 1995, with $x =$
`r sprintf("$%d$", inter_low_x)`, was made. The predicted aggregate score value is
`r regeq(linreg, sprintf("(%d)", inter_low_x))` $=$ `r sprintf("$%.3f$", inter_low_y)`,
which is not too far from the scores of many anime that aired that season, though that
cannot necessarily be attributed to the accuracy of the prediction.

### Extrapolative Prediction

```{r, message = FALSE, echo = FALSE}
astro_boy_x <- 1963 * 4 - (1980 * 4)
astro_boy_y <- linreg_eval(astro_boy_x)

future_x <- 2050 * 4 - (1980 * 4)
future_y <- linreg_eval(future_x)
```

Often miscredited as the first television anime series, *Tetsuwan Atom*, more commonly
known in the English-speaking community as *Astro Boy*, premiered on January 1, 1963
[@ref_astro_boy_miscredit]. With $x =$ `r sprintf("$%d$", astro_boy_x)`, the linear model
predicts the MyAnimeList aggregate score for *Tetsuwan Atom* to be
`r regeq(linreg, sprintf("(%d)", astro_boy_x))` $=$ `r sprintf("$%.3f$", astro_boy_y)`.
*Tetsuwan Atom* has in reality garnered a lower score on the website of $7.23$.

Another data point to extrapolate is perhaps the MyAnimeList aggregate score of some title
released in the future. With $x =$ `r sprintf("$%d$", future_x)`, the model predicts
the aggregate score for an anime premiering in Winter 2050 to be
`r regeq(linreg, sprintf("(%d)", future_x))` $=$ `r sprintf("$%.3f$", future_y)`.
However, the shape of the scatter indicates that the scores for anime in Winter 2050 would
encompass a wide range of values.

## Conclusions

One goal of the study was to examine if users in the MyAnimeList community on average have
a higher opinion of older anime---those that might be considered "classics"---than more
recent releases. While the linear model suggests a negative trend in aggregate scores over
time, the overall change over time is minimal. Furthermore, it is difficult to attribute
this strictly to MyAnimeList users' preference for anime from the 1980's and 1990's that
many might consider as classics. The limitations of the data discussed previously apply.
Particularly, it is likely and significant that newer viewers of anime have not watched at
all or as many older anime. Another important factor is that anime has become and is
becoming more mainstream, and production of it, more widespread (and therefore having
greater variation in quality) [@ref_anime_rec_ann]; thus it makes sense that the variation
in scores increases with time.

Besides the variation in MyAnimeList aggregate score increasing with the number of
seasons since Winter 1980 in which anime premiere, there does not appear to be any
kind of strong relationship between the values of the explanatory and response variables.

### Further Research Possibilities

The study provided an example of data from which definitive conclusions are not easy to
derive. Further research could be done to perhaps consider and capture the sentiment of
written user reviews and other factors to more accurately examine the change in user
perceptions of anime over time.

\End{multicols}

\newpage
\newgeometry{margin=4cm}

# Appendix

## Appendix A: Scripts {#appendix-a}

### Sampling Script

The sampling script is written in Go, an open-source, general-purpose programming language
created by developers at Google [@ref_golang].

\small
\setstretch{0.8}

~~~~~~{#sample-selector .go .numberLines}
package main

import (
  "bytes"
  "encoding/csv"
  "math/rand"
  "os"
  "strconv"
  "time"
)

func main() {
  file, err := os.Open("data.csv")
  if err != nil {
    panic(err)
  }
  defer file.Close()

  r := csv.NewReader(file)

  nfile, err := os.Create("sample.csv")
  if err != nil {
    panic(err)
  }
  w := csv.NewWriter(nfile)
  defer w.Flush()

  w.Write([]string{"ID", "Title", "Rank", "Score", "ScoredBy",
    "Popularity", "Members", "Favorites", "Season", "Episodes"})

  data, err := r.ReadAll()
  if err != nil {
    panic(err)
  }
  data = data[1:]

  seed := rand.NewSource(time.Now().UnixNano())
  rng := rand.New(seed)

  max := len(data)

  for i := 0; i < 2000; i++ {
    n := rng.Intn(max)
    row := data[n]

    if len(row[8]) != 5 {
      i--
    } else {
      parts := splitSubN(row[8], 4)
      year, err := strconv.Atoi(parts[0])
      if err != nil {
        panic(err)
      }
      sn, err := strconv.Atoi(parts[1])
      if err != nil {
        panic(err)
      }

      v := (4*year + sn - 1) - (4 * 1980) // number of seasons since Winter 1980
      if v < 0 {
        i--
        continue
      }
      row[8] = strconv.Itoa(v)

      w.Write(row)
    }
  }
}

func splitSubN(s string, n int) []string {
  sub := ""
  subs := []string{}

  runes := bytes.Runes([]byte(s))
  l := len(runes)
  for i, r := range runes {
    sub = sub + string(r)
    if (i+1)%n == 0 {
      subs = append(subs, sub)
      sub = ""
    } else if (i + 1) == l {
      subs = append(subs, sub)
    }
  }

  return subs
}
~~~~~~

\normalsize
\setstretch{1}

## Appendix B: Sample Data {#appendix-b}

\setstretch{0.9}

```{r data_display, echo = FALSE, message = FALSE}
df <- data.frame(ID = data$ID, Title = data$Title,
                 Score = data$Score, Season = data$Season)
trim <- function(x) gsub("^\\s+|\\s+$", "", x)
df["Title"] <- unlist(Map({
  function(s) {
    if (nchar(as.character(trim(s))) > 53) {
      s <- paste(substr(trim(s), 0, 50), "...", sep = "")
    } else {
      s <- paste(s)
    }
    s
  }
}, df$Title))


knitr::kable(df, "pandoc",
             col.names = c("ID[^3]", "Title[^4]",
                           "Score[^5]", "Season[^6]"),
             align = c("r", "l", "l", "l")
)
```

[^1]: Otherwise, the data set has not been audited to remove
    potentially questionable titles.

[^2]: See [Appendix A](#appendix-a).

[^3]: MyAnimeList database identifier.

[^4]: Titles have been truncated to 50 characters.

[^5]: MyAnimeList aggregate score.

[^6]: Number of seasons since Winter 1980.

Source Code

The source code can be found on GitHub at github.com/Dophin2009/rmdpresentation